MULTI-TEST ANALYSIS OF REAL-TIME NUCLEIC ACID AMPLIFICATION

Field of the Invention 

The present invention relates to a method of analyzing a sample for the 
presence of a nucleic acid. More particularly, the present invention is directed to an 
automated method for detecting and reporting the presence of a predetermined nucleic 
acid in a sample by analyzing data obtained during amplification of the nucleic acid. 



Background and Summary of the Invention 
Amplification of DNA by polymerase chain reaction (PCR) is a technique
fundamental to molecular biology. Nucleic acid analysis by PCR requires sample
preparation, amplification, and product analysis. Although these steps are usually
performed sequentially, amplification and analysis can occur simultaneously. DNA dyes
or fluorescent probes can be added to the PCR mixture before amplification and used to
analyze PCR products during amplification. Sample analysis occurs concurrently with
amplification in the same tube within the same instrument. This combined approach
decreases sample handling, saves time, and greatly reduces the risk of product
contamination for subsequent reactions, as there is no need to remove the samples from
their closed containers for further analysis. The concept of combining amplification with
product analysis has become known as "real time" PCR. See, for example, U.S. Patent
No. 6,174,670, incorporated herein by reference.

Monitoring fluorescence each cycle of PCR initially involved the use of
ethidium bromide. Higuchi R, G Dollinger, PS Walsh and R Griffith, Simultaneous
amplification and detection of specific DNA sequences, Bio/Technology 10:413-417,
1992; Higuchi R, C Fockler, G Dollinger and R Watson, Kinetic PCR analysis: real time
monitoring of DNA amplification reactions, Bio/Technology 11:1026-1030, 1993. In
that system fluorescence is measured once per cycle as a relative measure of product
concentration. Ethidium bromide detects double stranded DNA: if template is present,
fluorescence intensity increases with temperature cycling. Furthermore, the cycle
number where an increase in fluorescence is first detected is inversely related to the
log of the initial template concentration. Other fluorescent systems
have been developed that are capable of providing additional data concerning the nucleic
acid concentration and sequence.
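
The inverse relationship between the detection cycle and the log of the initial template can be sketched as follows, under the simplifying assumption (not stated in the text) of a constant per-cycle amplification efficiency E. The symbols N_0 (initial template amount), N_c (amount after c cycles), N_T (a hypothetical detection threshold) and c_T (the cycle at which that threshold is reached) are introduced here only for illustration.

```latex
\begin{align*}
N_c &= N_0\,(1+E)^{c} \\
N_T &= N_0\,(1+E)^{c_T} \\
c_T &= \frac{\log N_T - \log N_0}{\log(1+E)}
\end{align*}
```

Under this idealized model, c_T falls linearly as log N_0 rises. With E = 1 (perfect doubling), each tenfold increase in N_0 lowers c_T by log 10 / log 2, or about 3.32 cycles.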

While PCR is an invaluable molecular biology tool, the practical
implementation of real time PCR techniques has lagged behind the conceptual promise.
Currently available instrumentation generally does not actually analyze data during PCR;
it simply acquires the data for later analysis. After PCR has been completed, multiple
manual steps are necessary to analyze the acquired data, and human judgment is typically
required to provide the analysis result. What is needed is a system for automating data
acquisition and analysis so that no user intervention is required for reporting the
analytical results. Thus, when the temperature cycling in a polymerase chain reaction
amplification is complete, the system software is automatically triggered and the results,
for example, the presence or absence of a given pathogen, are immediately displayed on
screen. Algorithms for detection, quantification, and genotyping are needed. Moreover,
initiation of the analysis algorithm can be implemented prior to completion of
temperature cycling. Data processing can occur during amplification, and concomitant
analysis results can be used to modify temperature cycling and to acquire additional data
during the latter stages of the amplification procedure to optimize amplification protocol
and data quality.

A major problem in automating PCR data analysis is identification of
baseline fluorescence. Background fluorescence varies from reaction to reaction.
Moreover, baseline drift, wherein fluorescence increases or decreases without relation to
amplification of nucleic acids in the sample, is a common occurrence. Prior attempts to
automate amplification data analysis involved setting the baseline fluorescence as that
measured at one or more predetermined early cycle numbers. This technique accounts
for the variation in background fluorescence, but it does not compensate for baseline
drift. Without compensation for baseline drift, automated amplification data analysis can
easily provide both false negative and false positive results.

Thus, a method of determining the presence of a nucleic acid in a sample
is provided, the method comprising the steps of providing a fluorescent entity capable of
indicating the presence of the nucleic acid and capable of providing a signal related to the
quantity of the nucleic acid, amplifying the nucleic acid through a plurality of
amplification cycles in the presence of the fluorescent entity, measuring fluorescence
intensity of the fluorescent entity at each of the plurality of amplification cycles to
produce a fluorescence value for each cycle related to the quantity of the nucleic acid
present at each cycle, obtaining a score from each of a plurality of tests, each of the
plurality of tests using the fluorescence values to generate the score, and using the scores
to ascertain whether the nucleic acid is present in the sample. In an illustrated
embodiment, the tests comprise a Confidence Interval Test and a Signal-to-Noise-Ratio
Test.

Additional features of the present invention will become apparent to those
skilled in the art upon consideration of the following detailed description of preferred
embodiments exemplifying the best mode of carrying out the invention as presently
perceived.

Brief Description of the Drawings 

Figs. 1a-l show a comparison of three fluorescence monitoring schemes,
(Figs. 1a, d, g, j) dsDNA dye, (Figs. 1b, e, h, k) exonuclease probe, and (Figs. 1c, f, i, l)
hybridization probe, for PCR amplification, wherein each scheme is illustrated (Figs. 1a-c)
before amplification and (Figs. 1d-f) after amplification, and fluorescence values are
shown (Figs. 1g-i) once during each cycle of PCR and (Figs. 1j-l) continuously during
PCR.

Fig. 2 is a graph illustrating logistic growth.

Figs. 3a-f show a comparison of various cycle-versus-fluorescence curve
types.

Fig. 4 illustrates a sliding window analysis for determining the slope of
the fluorescence-versus-cycle number graph at each cycle.

Fig. 5 shows typical fluorescence versus amplification cycle graphs for
(A) a negative sample and (B) a positive sample.

Fig. 6 also shows typical amplification graphs wherein (A) shows
fluorescence versus amplification cycle, (B) is the first derivative of fluorescence versus
amplification cycle, and (C) is the second derivative of fluorescence versus amplification
cycle.

Figs. 7-11 show the results for various samples wherein open white circles
represent the fluorescence measurement at each cycle, open black circles represent the
first derivatives, closed black circles represent second derivatives, large black circles
connected by lines represent the points contributing to the baseline calculation, and the
horizontal lines illustrate the baseline region. Figs. 7 and 8 illustrate positive results,
while Figs. 9-11 illustrate negative results.

Fig. 12 shows the results for the seven-test analysis where the CallValue,
or log(Score), is plotted against the number of samples. The (-1, 1) interval for
indeterminate calls is marked by dotted lines.

Detailed Description of the Invention

In describing and claiming the invention, the following terminology will
be used in accordance with the definitions set forth below.

As used herein, "nucleic acid," "DNA," and similar terms also include
nucleic acid analogs, i.e. analogs having other than a phosphodiester backbone. For
example, the so-called "peptide nucleic acids," which are known in the art and have
peptide bonds instead of phosphodiester bonds in the backbone, are considered within
the scope of the present invention.

As used herein, "fluorescence resonance energy transfer pair" or "FRET
pair" refers to a pair of fluorophores comprising a donor fluorophore and acceptor
fluorophore, wherein the donor fluorophore is capable of transferring resonance energy to
the acceptor fluorophore. In other words, the emission spectrum of the donor fluorophore
overlaps the absorption spectrum of the acceptor fluorophore. In preferred fluorescence
resonance energy transfer pairs, the absorption spectrum of the donor fluorophore does
not substantially overlap the absorption spectrum of the acceptor fluorophore.

As used herein, "FRET oligonucleotide pair" refers to a pair of
oligonucleotides, each labeled with a member of a fluorescence resonance energy transfer
pair, wherein hybridization to complementary target nucleic acid sequences brings the
fluorescent entities into a fluorescence resonance energy transfer relationship.

The present invention is directed to a method of analyzing a sample for
the presence of a nucleic acid wherein the sample is amplified, preferably using PCR, in
the presence of a fluorescent probe capable of detecting the presence of the nucleic acid
sample. In one embodiment, a baseline region is determined by comparing the
fluorescence at various amplification cycles, and the fluorescence at each of various
amplification cycles is compared to the baseline region to determine whether the
fluorescence measurements fall outside of that baseline region. In another embodiment,
various tests are performed on the fluorescence data acquired during amplification, each of
which produces a numeric score. The scores are then used to determine a composite
value, and a call is made based on that value.

Many different probes have recently become available for monitoring
PCR. Although not sequence specific, double stranded DNA (dsDNA) specific dyes can
be used in any amplification without the need for probe synthesis. Such dyes include
ethidium bromide and SYBR™ Green I. With dsDNA dyes, product specificity can be
increased by analysis of melting curves or by acquiring fluorescence at a high
temperature where nonspecific products have melted. Ririe KM, Rasmussen RP and CT
Wittwer, Product differentiation by analysis of DNA melting curves during the
polymerase chain reaction, Anal. Biochem. 245:154-160, 1997; Morrison TB, JJ Weis
and CT Wittwer, Quantification of low copy transcripts by continuous SYBR Green I
monitoring during amplification, BioTechniques 24:954-962, 1998.

Oligonucleotide probes can also be covalently labeled with fluorescent
molecules. Hairpin primers (Sunrise™ primers), hairpin probes (Molecular Beacons™)
and exonuclease probes (TaqMan™) are dual-labeled oligonucleotides that can be
monitored during PCR. These probes depend on fluorescence quenching of a
fluorophore by a quencher on the same oligonucleotide. Fluorescence increases when
hybridization or exonuclease hydrolysis occurs.

An illustrated probe design employs two oligonucleotides, each labeled
with a fluorescent probe. Hybridization of these oligonucleotides to a target nucleic acid
brings the two fluorescent probes close together to allow resonance energy transfer to
occur. Wittwer CT, MG Herrmann, AA Moss and RP Rasmussen, Continuous
fluorescence monitoring of rapid cycle DNA amplification, BioTechniques 22:130-138,
1997. These hybridization probes require only a single fluorescent label per probe and
are easier to design and synthesize than dual labeled probes. Acceptable fluorophore
pairs for use as fluorescence resonance energy transfer pairs are well known to those
skilled in the art and include, but are not limited to, fluorescein/rhodamine,
phycoerythrin/Cy7, fluorescein/Cy5, fluorescein/Cy5.5, fluorescein/LC Red 640, and
fluorescein/LC Red 705. Donor-quencher FRET oligonucleotide pairs may also be
employed, wherein fluorescence of the donor fluorophore is quenched by the quencher
fluorophore when the two fluorescent probes are brought close together. It is understood
that when donor-quencher FRET oligonucleotide pairs are used, the fluorescence values,
and hence all maximum and minimum values, will be the inverse of those described below.

Another type of hybridization probe, a "single-labeled oligonucleotide
probe," employs an oligonucleotide probe wherein each probe is constructed of a single
oligonucleotide and a single fluorescent dye. The oligonucleotide probes are constructed
such that hybridization of the probe to a target sequence affects the fluorescent emission
of the fluorescent dye. Single-labeled oligonucleotide probes may employ various probe
designs. In one design, hybridization of the probe to the target sequence places the
fluorescent dye in close proximity to a guanine residue, with resultant quenching of
fluorescent emission. In another embodiment, the fluorescent entity replaces a base in
the oligonucleotide probe structure, and upon hybridization this "virtual nucleotide" is
placed in a complementary position to a G residue, with resultant quenching of
fluorescence. In other embodiments, probes are constructed such that hybridization
results in an increase in fluorescent emission. In one such embodiment, the fluorescent
entity is attached to a G residue, with increased fluorescence upon hybridization. Further
information on single-labeled oligonucleotide probe design is found in U.S. Patent
Application No. 09/927,842, filed August 10, 2001, herein incorporated by reference. As
with the donor-quencher FRET oligonucleotide pairs, when fluorescent quenching
indicates hybridization, the fluorescence values, and hence all maximum and minimum
values, will be the inverse of those described below.

SYBR™ Green I, exonuclease probe, and hybridization probe designs are
shown in Figs. 1a-l. For each design, schematics both before (Figs. 1a-c) and after (Figs.
1d-f) amplification are shown, as well as cycle versus fluorescence amplification plots of
positive and negative controls (Figs. 1g-i), and temperature versus fluorescence plots
from continuous monitoring (Figs. 1j-l). SYBR Green I fluorescence increases as more
dsDNA is made (Figs. 1a, d, g, j). Because the dye is not sequence specific, a negative
control also increases in fluorescence during later cycles as primer dimers are formed. In
Figs. 1b, e, h, k, dual-labeled fluorescein/rhodamine probes are cleaved during
polymerase extension by 5'-exonuclease activity, separating the fluorophores and
increasing the fluorescein emission. The signal generated is cumulative and the
fluorescence continues to increase even after the amount of product has reached a
plateau. Figs. 1c, f, i, l show use of a FRET oligonucleotide pair wherein two probes
hybridize next to each other, one labeled 3' with fluorescein and the other labeled 5' with
Cy5. As product accumulates during PCR, fluorescence energy transfer to Cy5 increases.
The fluorescence of hybridization probes decreases at high cycle number because of
probe/product competition.

Standard instruments for PCR complete 30 cycles in about two to four
hours. A preferred system is a rapid thermal cycling device using capillary tubes and hot
air temperature control. See, for example, United States Patent No. 5,455,175, herein
incorporated by reference. Because of the low heat capacity of air and the thin walls and
high surface area of capillary tubes, small volume samples can be cycled quickly. The
total amplification time for 30 cycles is reduced to 15 minutes with excellent results.

The use of capillaries with forced air heating allows precise control of
sample temperature at a speed not possible with other designs. For example, sample
temperature versus time plots in capillaries show sharp spikes at denaturation and
annealing temperatures, whereas several seconds are required for all of the sample to
reach equilibrium in conical plastic tubes. Wittwer, CT, GB Reed and KM Ririe, Rapid
cycle DNA amplification, in K Mullis, F Ferre, and R Gibbs (Eds.), The polymerase
chain reaction, Springer-Verlag, Deerfield Beach, FL, pp. 174-181, 1994; Wittwer, CT,
BC Marshall, GB Reed, and JL Cherry, Rapid cycle allele-specific amplification: studies
with the cystic fibrosis delta F508 locus, Clin. Chem. 39:804-809, 1993. Rapid
temperature cycling with minimal annealing and denaturation times improves
quantitative PCR and increases the discrimination of allele specific amplification. Weis,
JH, SS Tan, BK Martin, and CT Wittwer, Detection of rare mRNA species via
quantitative RT-PCR, Trends in Genetics 8:263-4, 1992; Tan ST and JH Weis,
Development of a sensitive reverse transcriptase PCR assay, RT-RPCR, utilizing rapid
cycle times, PCR Meth. and Appl. 2:137-143, 1992. Rapid cycling for cycle sequencing
reduces sequencing artifacts and minimizes "shadow banding" in dinucleotide repeat
amplifications. Swerdlow H, K Dew-Jager and RF Gesteland, Rapid cycle sequencing in
an air thermal cycler, BioTechniques 15:512-519, 1993; Odelberg SJ and R White, A
method for accurate amplification of polymorphic CA-repeat sequences, PCR Meth.
Appl. 3:7-12, 1993. For long PCR, yield is improved when the sample is exposed as
little as possible to high denaturation temperatures. Gustafson CE, RA Alm and TJ
Trust, Effect of heat denaturation of target DNA on the PCR amplification, Gene 123:241-
244, 1993. The RapidCycler®, developed by Idaho Technology, is an example of a rapid
thermal cycling device. The LightCycler® (Roche Diagnostics, Indianapolis, IN) is a
rapid temperature cycler with a fluorimeter, wherein light emitting diodes are used for
excitation and photodiodes are used for detection.

The present invention is directed to methods for automating detection of
nucleic acids with real time PCR. While these algorithms may be applied to any
amplification system, in one embodiment these algorithms are integrated into the
LightCycler® platform. These analysis routines are triggered by the completion of rapid
thermal cycling for "hands off" amplification, analysis, and final results presentation in a
total of less than 15 min. The analysis routines take from <1 second for detection and
quantification to <10 seconds for genotyping. LabView (National Instruments, Austin,
TX), a graphical programming language, is preferred for LightCycler® instrument
control. The LightCycler® is a PC-based instrument. The LightCycler® may be
packaged in a portable format for field use.

Perhaps the most basic analysis of real time PCR data is a judgement of
whether a targeted nucleic acid is present. If the nucleic acid is present, further
quantification and genotyping may take place. In many cases, a yes/no judgement is all
that is needed. For example, one may want to determine whether E. coli O157:H7 is in a
sample of hamburger, whether anthrax is present in a suspicious white powder, or
whether hepatitis C is in a unit of blood. Real time PCR can improve yes/no detection
over end point PCR assays because fluorescence is acquired at each cycle.

Inspection of cycle versus fluorescence data from positive and negative
real time PCR runs (see Figs. 1h and 1i) suggests that discrimination is simple. The
positive samples increase with cycle number while the negative samples remain at
baseline. A trained observer expects positive samples to follow an S-shape curve,
beginning with a baseline, followed by an exponential segment, and finishing with a
plateau. The expected curve is similar to the logistic model for population growth, where
the rate of growth is proportional to both the population size y and to the difference L-y,
where L is the maximum population that can be supported. For small y, growth is
exponential, but as y nears L the growth rate approaches zero. An example of logistic
growth is shown in Fig. 2.
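
As an illustration of the logistic model just described, the following Python sketch samples the closed-form solution of dy/dt = (rate / L) * y * (L - y) at integer cycle numbers. The function name `logistic_curve` and the parameter values (L = 100, starting value 0.01, rate 0.5 per cycle) are hypothetical choices made only to produce an S-shaped curve; they are not taken from the figures.

```python
import math

def logistic_curve(n_cycles, L=100.0, y0=0.01, rate=0.5):
    """Sample y(c) = L / (1 + ((L - y0) / y0) * exp(-rate * c)) for c = 0..n_cycles.

    This is the closed-form solution of dy/dc = (rate / L) * y * (L - y),
    so growth is proportional to both y and (L - y).
    """
    ratio = (L - y0) / y0
    return [L / (1.0 + ratio * math.exp(-rate * c)) for c in range(n_cycles + 1)]

# Illustrative parameters only: 45 cycles, plateau at L = 100.
curve = logistic_curve(45)
early_growth = curve[1] / curve[0]   # close to exp(rate) while y is small
```

While y is small the per-cycle growth factor is nearly constant (exponential growth), and the curve flattens as y approaches L.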

Although intuitively simple, accurately discriminating between positive
and negative samples is not easy in practice. The simplest approach is to set a horizontal
fluorescence threshold as a discriminator between positive and negative samples. This
works best with a stable baseline (between and within samples) and a known
fluorescence intensity that correlates with "positive." Although this method will work on
obvious samples (e.g. Figs. 1h and 1i), a more robust algorithm is desired that will work
under a wider variety of conditions. For example, the baseline may drift and the
fluorescence intensity may vary greatly between different samples and probe techniques.
Thus, the present invention is directed to a method that will: (1) automatically identify
the baseline, (2) use the baseline variance to establish a confidence region, and (3) call
each sample positive or negative based on the relationship of the confidence region to the
fluorescence data.

Figs. 3a-f display various types of amplification curves, all of which have
been observed in LightCycler® runs. Figs. 3a and b show curves from samples that are
negative with no template present. The fluorescence scales in Figs. 3a and b are
magnified (compared to Figs. 3c-f) to demonstrate the baseline drift and to motivate
algorithms that are independent of the fluorescence intensity. There is always
some baseline drift during cycling. This drift usually is greatest at the beginning of
cycling but later levels off, and may be either downward (Fig. 3a) or upward (Fig. 3b).
This baseline drift of negative reactions must be distinguished from positive reactions of
either low copy numbers (Fig. 3c) or high copy numbers (Fig. 3d) of starting template.
The method also needs to work with various probe designs, including exonuclease (Fig. 3e)
and hybridization (Fig. 3f) probes.

Automatic identification of the background is surprisingly difficult. In
prior art methods, the baseline is determined as a function of measured fluorescence at a
fixed range of cycles near the beginning of amplification. However, selection of a fixed
range of cycles is not adequate because both downward drift (Fig. 3a) and high copy (Fig.
3d) amplifications may be incorrectly called.

Confidence Band Analysis

In one embodiment of the present invention, the background is identified
by analyzing the fluorescent measurements over a wide range of amplification cycles.
Preferably, the background is identified by selecting the sliding window (Fig. 4) with the
shallowest slope. That is, the slope at each cycle is calculated by linear regression of the
local neighborhood (for example, a 7 point sliding window). The window with the slope
of lowest absolute value (least difference from zero) defines the background region.
Once the background region has been identified, the variation of these background points
about their regression line (the square root of the mean square error) is multiplied by a
constant to determine a confidence band. This confidence band will have a slope near
zero and is extrapolated across all cycles. If the fluorescence of the last cycle is within
the confidence band, the sample is negative; if it is outside the band, the sample is
positive. Fig. 5 demonstrates both cases.
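
The basic test described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `fit_line` and `confidence_band_call`, the default window of 7 points and band factor of 20, the choice to divide the squared error by (window - 2), and the synthetic data are all assumptions made for the example.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def confidence_band_call(fluor, window=7, factor=20.0):
    """Return True (positive) if the last cycle falls outside the confidence band."""
    cycles = list(range(1, len(fluor) + 1))
    best = None
    # Find the sliding window whose regression slope is closest to zero.
    for start in range(len(fluor) - window + 1):
        xs, ys = cycles[start:start + window], fluor[start:start + window]
        slope, intercept = fit_line(xs, ys)
        if best is None or abs(slope) < abs(best[0]):
            best = (slope, intercept, xs, ys)
    slope, intercept, xs, ys = best
    # Root mean square error about the regression line.
    # Dividing by (window - 2) degrees of freedom is an assumption.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    half_width = factor * math.sqrt(sse / (window - 2))
    # Extrapolate the background line to the last cycle and compare.
    predicted = slope * cycles[-1] + intercept
    return abs(fluor[-1] - predicted) > half_width

# Synthetic data (hypothetical): a noisy flat baseline, with and without an S-curve.
noise = [0.01 * (-1) ** i for i in range(40)]
negative = [1.0 + e for e in noise]
positive = [1.0 + e + 10.0 / (1.0 + math.exp(-0.6 * (i - 25))) for i, e in enumerate(noise)]
```

On these synthetic curves the flat trace is called negative and the S-shaped trace is called positive.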

This algorithm should work well in most cases. However, with the high
copy fluorescence curve type (Fig. 3d), the shallowest slope might be found at early
cycles (resulting in a correct positive call) or at late cycles (resulting in an incorrect
negative call). This exception may be handled by analyzing the curve shape. In a
well-behaved amplification, the expected amplification curve shape is ordered by cycle
number as follows:

1. Minimum fluorescence

2. Maximum second derivative (F")

3. Maximum first derivative (F')

4. Minimum second derivative (F")

5. Maximum fluorescence

This gives the characteristic S-curve shape expected during PCR (Fig. 6A). The
maximum slope (first derivative) is obtained from the sliding window analysis already
performed for background identification. Preferably, the second derivatives are
calculated by a 3-point sliding window linear regression of the first derivatives. If the
curve shape is well behaved (that is, if looking at a graph of Fig. 6, and reading from
lowest to highest cycle number, the features occur in the order listed above), then the
background is only selected from sliding windows centered at cycle numbers less than
the second derivative maximum. This solves the potential analysis problem with Fig. 3d.
In other preferred embodiments, cycle numbers less than the first derivative maximum or
cycle numbers less than the second derivative minimum may be used. It will be further
understood that any cycle number between the second derivative maximum and the
second derivative minimum is a suitable cutoff cycle for use with this technique and is
within the scope of this invention.

Another method is to compare the cycle with the greatest fluorescence
(which is not necessarily the last cycle) to the confidence band. This is especially suited
for hybridization probes that may decrease in fluorescence with extensive cycling, such
as seen in Fig. 3f. The cycle with the greatest fluorescence should be used only when the
curve shape is well behaved, in order to prevent false positive calls with downward
drifts, such as shown in Fig. 3a.

The variables to optimize for automatic detection are: 1) the window size
for the first derivative estimate, 2) the window size for the second derivative estimate,
and 3) the confidence band factor. A reasonable value for the first derivative window
size is 7, although 3, 5, 9, and 11 are also quite useful. For the second derivative the
preferred window size is 3, but 5 and 7 have also proven to be useful values. A
preferred confidence band factor is 20. As the first derivative window size increases, the
variance estimate is more accurate, but the edge cycles (beginning and ending) are lost.

This algorithm is best understood by referring to the fluorescence versus
cycle test result plots shown in Figs. 7-11. The input data consist of one fluorescence
value for each cycle of amplification, shown as the closed white circles. Let this equal
array Yi, where i is the cycle number and N is the total number of cycles. The detection
criteria are:

A = the number of fluorescence values used to determine the first derivatives. It
is convenient to use odd numbers, so that the first derivatives correspond to
integer cycle numbers. As discussed above, reasonable values include 3, 5, 7, 9,
and 11. Preferably, 7 is used as the first derivative window size.

B = the number of first derivative values used to determine the second
derivatives. Again, it is convenient to use odd numbers, so that the second
derivative values also correspond to integer cycle numbers. Reasonable values
include 3, 5, and 7, with 3 being the preferred value.

C = the confidence band factor. This factor determines the confidence band by
multiplying it by a variance measure, preferably the square root of the mean
square error.

The first step is to calculate the first and second derivatives. Although
there are many ways to accomplish this, a preferred method is to determine the first
derivatives as the slope of a linear regression line through A points, assigning the
value to the central cycle number. Some cycles on either edge cannot be assigned first
derivatives, but first derivatives can be provided for cycles (A + 1)/2 through N - (A - 1)/2.
Similarly, the second derivatives are calculated as the slope of the first derivative points
and assigned to cycles (A + 1)/2 + (B - 1)/2 through [N - (A - 1)/2] - (B - 1)/2.
Calculation of the first and second derivatives provides arrays Y'i and Y"i, with some
edge values missing. In Fig. 7, the first and second derivatives are displayed as open
black circles and closed black circles, respectively.
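
The derivative step and its edge-cycle bookkeeping can be sketched as follows. The function names `regression_slope`, `sliding_slopes` and `derivatives` are hypothetical, cycles are numbered from 1, and the quadratic test signal is chosen only because its derivatives are known exactly; none of this is taken from the source.

```python
def regression_slope(ys):
    """Least-squares slope of ys against equally spaced x = 0, 1, 2, ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys)) / sxx

def sliding_slopes(values, first_cycle, window):
    """Map the central cycle number of each full window to that window's slope."""
    half = (window - 1) // 2
    return {
        first_cycle + start + half: regression_slope(values[start:start + window])
        for start in range(len(values) - window + 1)
    }

def derivatives(fluor, a=7, b=3):
    """Return (d1, d2) as dicts keyed by cycle number; edge cycles are absent."""
    d1 = sliding_slopes(fluor, 1, a)
    d1_cycles = sorted(d1)
    # Second derivatives are slopes of the first-derivative points.
    d2 = sliding_slopes([d1[c] for c in d1_cycles], d1_cycles[0], b)
    return d1, d2

# Hypothetical check signal: Y = cycle**2 for N = 40, so Y' = 2*cycle and Y'' = 2.
quadratic = [c ** 2 for c in range(1, 41)]
d1, d2 = derivatives(quadratic)
```

With N = 40, A = 7 and B = 3, first derivatives exist for cycles 4 through 37 and second derivatives for cycles 5 through 36, matching the ranges given above.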

The next step is to determine whether the fluorescence curve has a
well-behaved shape. As discussed above, the well-behaved shape occurs when the cycles with
minimum fluorescence, maximum second derivative, maximum first derivative,
minimum second derivative, and maximum fluorescence occur in that order, from low to
high cycle number.
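
A minimal sketch of this shape check follows. The function name `is_well_behaved`, the use of dicts keyed by cycle number, and the requirement of strictly increasing cycle numbers (ties count as not well-behaved) are assumptions for illustration. The example curves use analytic derivatives of a sigmoid rather than measured data.

```python
import math

def is_well_behaved(fluor, d1, d2):
    """fluor, d1, d2: dicts mapping cycle number -> value.

    Returns True when the five landmark cycles occur in the expected order.
    """
    landmarks = [
        min(fluor, key=fluor.get),   # minimum fluorescence
        max(d2, key=d2.get),         # maximum second derivative
        max(d1, key=d1.get),         # maximum first derivative
        min(d2, key=d2.get),         # minimum second derivative
        max(fluor, key=fluor.get),   # maximum fluorescence
    ]
    return all(earlier < later for earlier, later in zip(landmarks, landmarks[1:]))

def sigmoid(c):
    """Hypothetical S-curve centered at cycle 20."""
    return 1.0 / (1.0 + math.exp(-(c - 20) / 2.0))

cycles = range(1, 41)
rising = {c: sigmoid(c) for c in cycles}
rising_d1 = {c: sigmoid(c) * (1 - sigmoid(c)) / 2 for c in cycles}
rising_d2 = {c: sigmoid(c) * (1 - sigmoid(c)) * (1 - 2 * sigmoid(c)) / 4 for c in cycles}

# A downward drift: the mirror image of the rising curve.
falling = {c: 1.0 - v for c, v in rising.items()}
falling_d1 = {c: -v for c, v in rising_d1.items()}
falling_d2 = {c: -v for c, v in rising_d2.items()}
```

The rising S-curve passes the check, while the downward-drifting curve fails it because its minimum fluorescence occurs at the last cycle.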

The baseline is then determined. If the fluorescence curve does not have
the expected shape, the cycle whose first derivative is closest to zero is used. If the
fluorescence curve has a well-behaved shape, the cycle whose first derivative is closest to
zero is chosen from among all cycles prior to the cycle with the maximum second
derivative (again, any cycle between the maximum second derivative and the minimum
second derivative may also be used as the cutoff cycle number). The baseline is drawn
through the fluorescence value of the chosen cycle with a slope of its first derivative. In
Fig. 7, the A points contributing to the first derivative calculation for the baseline are
displayed as large black dots connected by a line.

The next step is to determine the test point cycle, that is, the cycle used to
compare against the baseline for determining a positive or negative result. If the curve is
not well-behaved, the test point is the last cycle. If the fluorescence curve is
well-behaved, the test point is the cycle with fluorescence farthest from the baseline. The test
point fluorescence of a negative sample can be predicted as the intersection of the
baseline with the test point cycle.

Next, a confidence interval can be determined about the predicted
negative test point. Preferably, this is done by finding the square root of the mean square
error about the baseline of the A points used to determine the baseline. This is multiplied by
C. The product is added to the predicted negative test point to get the upper fluorescence
limit of the confidence interval and is subtracted from the predicted negative test point to
get the lower limit of the confidence interval. These limits are shown on Fig. 7 as two solid
horizontal lines.

The final step is to declare the sample positive or negative. If the test
point fluorescence is outside of the confidence interval, the sample is positive. If it is
within the interval, the sample is negative. Figs. 7 and 8 are samples which are positive,
while Figs. 9-11 are negative samples.
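
The baseline, test point, confidence interval, and final call can be sketched together as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `window_slope` and `call_sample` are hypothetical, the mean square error is taken as the sum of squared deviations divided by A, the well-behaved flag and cutoff cycle are passed in by the caller (in practice they would come from the shape analysis), and the data are synthetic.

```python
import math

def window_slope(ys):
    """Least-squares slope of ys against equally spaced cycles."""
    n = len(ys)
    mean_x, mean_y = (n - 1) / 2, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys)) / sxx

def call_sample(fluor, a=7, c=20.0, well_behaved=False, cutoff=None):
    """Call one sample; fluor[0] is cycle 1.

    Returns (positive, (lower_limit, upper_limit), test_cycle).
    """
    n, half = len(fluor), (a - 1) // 2
    # First derivative for every cycle that has a full window of A points.
    d1 = {k: window_slope(fluor[k - 1 - half:k + half])
          for k in range(half + 1, n - half + 1)}
    # Baseline cycle: first derivative closest to zero, restricted to cycles
    # before the cutoff when the curve is well-behaved.
    restrict = well_behaved and cutoff is not None
    candidates = [k for k in d1 if not restrict or k < cutoff]
    base = min(candidates, key=lambda k: abs(d1[k]))

    def baseline(cycle):
        # Line through the chosen cycle's fluorescence with its first-derivative slope.
        return fluor[base - 1] + d1[base] * (cycle - base)

    if well_behaved:
        test = max(range(1, n + 1), key=lambda k: abs(fluor[k - 1] - baseline(k)))
    else:
        test = n
    # Root mean square error of the A baseline points about the baseline
    # (dividing by A is an assumption).
    window = range(base - half, base + half + 1)
    mse = sum((fluor[k - 1] - baseline(k)) ** 2 for k in window) / a
    limit = c * math.sqrt(mse)
    predicted = baseline(test)
    positive = abs(fluor[test - 1] - predicted) > limit
    return positive, (predicted - limit, predicted + limit), test

# Synthetic data (hypothetical): noisy flat baseline, with and without an S-curve.
noise = [0.01 * (-1) ** i for i in range(40)]
flat = [1.0 + e for e in noise]
s_curve = [1.0 + e + 10.0 / (1.0 + math.exp(-0.6 * (i - 25))) for i, e in enumerate(noise)]

neg_call, neg_limits, neg_test = call_sample(flat)
pos_call, pos_limits, pos_test = call_sample(s_curve, well_behaved=True, cutoff=23)
```

The cutoff of 23 used for the S-curve is a hand-picked stand-in for the cycle of the second derivative maximum.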

Multi-Test Analysis

A further approach to automated analysis of real-time nucleic acid
amplification is to use algorithms that employ one or more tests to obtain an aggregate
score that defines, with higher accuracy and robustness, whether the sample is positive,
negative, or indeterminate. A test similar to the Confidence Band Analysis is employed,
except that the test produces a value instead of a positive or negative call.

High accuracy is obtained if at least one additional test is employed, and 
preferably if four additional tests are employed, most preferably if six additional tests are 
employed in addition to the confidence interval test. Each of the tests produce a score. 

The overall composite score for each sample is calculated by the following
formula:

$$\mathrm{Score} = \frac{T_1^{P_1}\, T_2^{P_2} \cdots T_n^{P_n}}{\mathrm{Threshold}}$$

in which the numbers $P_1, P_2, \ldots, P_n$ are predetermined correction factors for each test, and
Threshold is a predetermined score threshold that provides a convenient dividing value
between negative and positive calls. Ranges are chosen for definitively "positive" and
definitively "negative" calls, and for the "indeterminate" or "unable-to-call" calls. If
Score is used directly to set these ranges, a negative sample will have a value between 0
and 1, a positive sample will have a value greater than 1, and a decision is made about
how much of those two regions needs to be carved out as the "indeterminate" region. A
more convenient way to choose the ranges is to use the logarithm of Score, where
CallValue is equal to log(Score):

$$\mathrm{CallValue} = \sum_{i=1}^{n} P_i \log T_i - \log(\mathrm{Threshold})$$



By taking the logarithm of Score, a negative sample will now have a negative value
and a positive sample will have a positive value. The logarithm also makes the meaning
of Threshold easier to understand, as it simply shifts the values either more negative or
more positive. The indeterminate region can be chosen, for example, as being between
-1 and 1, and definitive positives and negatives can be placed outside of that region.
Again, taking the logarithm of Score is not essential for the invention, but it is shown
here as a convenient way of describing the process.
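As a rough illustration of how Score, CallValue and the three call ranges relate, the following Python sketch may help. The function names are hypothetical, the base-10 logarithm is an assumption (the text does not fix the base), and values exactly at -1 or 1 are treated here as indeterminate.

```python
import math

def composite_score(test_scores, factors, threshold):
    """Score: product of each test score raised to its correction factor, over Threshold."""
    product = 1.0
    for t, p in zip(test_scores, factors):
        product *= t ** p
    return product / threshold

def call_value(test_scores, factors, threshold):
    """CallValue: log(Score), written as a weighted sum of logs (base 10 assumed)."""
    weighted = sum(p * math.log10(t) for t, p in zip(test_scores, factors))
    return weighted - math.log10(threshold)

def classify(cv, low=-1.0, high=1.0):
    """Map a CallValue onto the illustrated ranges: (-1, 1) is indeterminate."""
    if cv > high:
        return "positive"
    if cv < low:
        return "negative"
    return "indeterminate"
```

With test scores of 1.0 and 100.0, correction factors of -5 and 1, and a Threshold of 2, Score is 50 and CallValue is about 1.7, which falls in the positive range.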

Described below are individual tests that can be used to provide the
composite Score and the CallValue. The mathematical definitions applied to the
individual tests that produce individual scores $T_i$ from the fluorescence signals should be
taken as examples only, and it is understood that other mathematical definitions can be
used. Alternative mathematical definitions may produce different $T_i$ values, in which
case both the correction factor $P_i$ and the Threshold may have to be re-assigned
appropriately using the teachings described herein.

Test 1: Signal-to-Noise Ratio Test

This test measures the ratio between what is considered signal and what is 
considered noise. One way to do this is to take the ratio between the total change in 
fluorescence and the sum of absolute fluorescence change seen each amplification cycle. 
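When the overall fluorescence is increasing with cycle number, the definition of the test
is

$$T_1 = \min_k \left( \frac{\sum_{j=k-m+1}^{k+m} \left| F_j - F_{j-1} \right|}{\left| F_{k+m} - F_{k-m} \right|} \right)$$

where $F_j$ represents fluorescence measurements from the instrument. The subscript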
represents the amplification cycle and runs from one to the total number of cycles. A
short window of cycle numbers (2m) is interrogated (for instance, 2m=6), and k is the
range variable, or midpoint of the window. The first cycle number in any given window
will be k-m, and the last cycle number k+m. When overall fluorescence is decreasing
with cycle number, the definition of the test is not applied. The value of this test is
greater than or equal to one. $T_1$ is one if fluorescence increases at each successive cycle
within the range of 2m. If there is noise, and fluorescence decreases between one or
more cycles, then $T_1$ will be greater than one. The main purpose of this test is to make a
qualitative assessment of negative samples, although if this test alone is employed, one
can be fooled by fluorescence curves with a rising baseline. It should be understood that
there are other ways to assess Signal-to-Noise, and the aforementioned method is meant
as an example of one such method. High accuracy in automated analysis may be
obtained by using the Signal-to-Noise Test in combination with the Confidence Interval
Test discussed below.
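One way to read the Signal-to-Noise description is as the smallest ratio, over all windows of width 2m, between the summed absolute cycle-to-cycle changes and the net change across the window. The Python sketch below follows that reading. It is an assumption rather than the definitive formula: taking the minimum over windows, skipping windows whose net change is not positive, and the function name are all choices made for illustration.

```python
import math

def signal_to_noise(f, m=3):
    """Smallest (path length / net rise) over windows k-m .. k+m.

    f is a list of fluorescence values indexed from zero.
    Returns math.inf when no window shows a net increase.
    """
    best = math.inf
    for k in range(m, len(f) - m):
        net = f[k + m] - f[k - m]
        if net <= 0:
            continue  # assumption: the test is not applied to non-rising windows
        path = sum(abs(f[j] - f[j - 1]) for j in range(k - m + 1, k + m + 1))
        best = min(best, path / net)
    return best
```

A strictly rising curve gives exactly one, because the path length equals the net rise. A curve such as 0, 2, 1, 3, 2, 4, 3 with m = 3 gives 9 / 3 = 3, reflecting the back-and-forth noise.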

Test 2: Confidence Interval Test

This test is essentially the Confidence Band Analysis discussed above, in
which a baseline segment of the fluorescence curve is dynamically established as a
confidence interval or confidence band, and the algorithm ascertains whether the
fluorescence value during a selected amplification cycle is inside or outside the
confidence band. The difference is that the above Confidence Band Analysis produces a
positive or negative call, while this Confidence Interval Test produces a value. This
Confidence Interval Test and the Signal-to-Noise Test are illustratively used together to
generate composite scores. One mathematical method to score this test is to first fit a
line to the curve using linear regression and then compute the sum of the squared residuals
about the line. The residual is normalized to a predetermined value called the NoiseLevel.
If the linear fit is defined as $L(j) = Aj + B$, where $j$ is the cycle number,
then the test is defined as

[equation for $T_2$ missing from the source text]

NoiseLevel will be dependent on the instrumentation that is used to monitor fluorescence
as the reaction proceeds. For the LightCycler® instrument, NoiseLevel = 0.05. The value
of $T_2$ is large for positive samples and close to one for samples that are noise dominated.
Therefore, this test identifies positive signals, but can miss low amplitude positive
signals. As with all other tests, there are other ways to mathematically describe the
Confidence Interval Test, and it should be understood that those will also work in this
invention.

Test 3: Channel Consistency Test

This test measures whether the data across multiple detection channels are
consistent with the expected pattern for positive amplification reactions. The precise
form of this test depends on the design of the detection channels and the specific reporter
chemistry that is used to provide a fluorescence signal that reflects the quantity of nucleic
acid. While fluorescence is usually monitored by a primary detection channel that is
most suited to recognize the reporter dye, in most multi-channel detection devices it is
possible to monitor the signal in other channels and to establish the expected input
characteristic that these secondary channels should receive in a problem-free positive
amplification reaction. For instance, if a secondary channel is capable of receiving the
emission from the reporter dye, we expect the maximum second derivative value in this
channel to be the same as in the primary channel. We may also expect the fluorescence
intensity in the secondary channel to be specifically lower than the primary channel. In a
situation where fluorescence from a contaminant interferes with all channels, the
expected difference in fluorescence intensity between channels may not be observed. By
observing the fluorescence in one or more secondary channels, a reaction that would be
otherwise called positive in the primary channel will be flagged as aberrant. In another
example, if a secondary channel is capable of receiving the emission of a donor dye,
rather than the reporter dye, a decrease in emission signal may be observed during
amplification, and here, the second derivative minimum, not the maximum, of the
secondary channel should be equal to the second derivative maximum of the primary
channel. Whatever the expected pattern is for the positive sample, if data from multiple
channels fall within tolerance for the expected pattern, then $T_3 = 4/3$, and if not, then
$T_3 = 3/4$.



Test 4: Efficiency Test 

This test measures the efficiency of the PCR reaction as measured by the
fluorescence curve. It assumes that PCR should be modeled with saturation. The
simplest appropriate fluorescence saturation model is

$$F_{n+1} - F_n = A\,F_n\,(\max(F) - F_n).$$

Then the transformation

$$\log F_j - \log(\max(F) - F_j) = Aj + B$$

is linear in the cycle number. Using this model, the efficiency is equal to $1 + A$. The test
itself is defined as

$$T_4 = 1 + \max(0, A)$$

where $A$ is determined by fitting the curve to a three-part function defined by

$$\log F_j - \log(\max(F) - F_j) = \begin{cases} c_1 & \text{when } j < j_1 \\ Aj + B & \text{when } j_1 < j < j_2 \\ c_2 & \text{when } j_2 < j \end{cases}$$

where $j_2 - j_1$ is required to be at least seven cycles. The unknowns $A$, $B$, $c_1$ and $c_2$ are
chosen to minimize the sum of the residuals squared over the fluorescence curve.

The value of $T_4$ is larger for positive samples, which have high efficiency,
than for negative samples, which have low efficiency. Therefore, this test distinguishes
positive from negative samples. For high accuracy automated calling, it is effective to
use this test together with the Channel Consistency, the Signal-to-Noise Ratio and the
Confidence Interval Tests.
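The three-part fit can be sketched as a brute-force search over the two breakpoints, fitting a constant, a line and a constant by least squares for each candidate split. The Python below is a minimal sketch under several assumptions that are not in the text: the function name, the use of NumPy, the slight padding of max(F) so that the logarithm stays finite at the largest point, strictly positive fluorescence values, and the assignment of the breakpoint cycles themselves to the adjacent segments.

```python
import numpy as np

def efficiency_test(f, min_span=7, rel_pad=0.01):
    """Return (T4, A) from a brute-force three-part least-squares fit."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    j = np.arange(1, n + 1, dtype=float)       # cycle numbers start at one
    ceiling = f.max() * (1.0 + rel_pad)        # assumption: padded max(F)
    y = np.log(f) - np.log(ceiling - f)        # requires f > 0 everywhere

    best_ssr, best_slope = np.inf, 0.0
    for a in range(1, n - min_span):           # y[:a] is the early constant part
        for b in range(a + min_span, n):       # y[a:b] is linear, y[b:] is constant
            slope, intercept = np.polyfit(j[a:b], y[a:b], 1)
            ssr = (np.sum((y[:a] - y[:a].mean()) ** 2)
                   + np.sum((y[a:b] - (slope * j[a:b] + intercept)) ** 2)
                   + np.sum((y[b:] - y[b:].mean()) ** 2))
            if ssr < best_ssr:
                best_ssr, best_slope = ssr, slope
    return 1.0 + max(0.0, best_slope), best_slope
```

For a logistic curve whose transformed values rise by about 0.5 per cycle, the sketch returns a $T_4$ close to 1.5, while a flat curve returns a value of essentially one.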

Test 5: Function Ordering 

As discussed above in the Confidence Band Analysis, a well-behaved
amplification curve has a characteristic s-shape or sigmoidal shape. This test measures
whether the fluorescence curve has the sigmoid shape expected of a sample that has been
amplified. The test determines whether the fluorescence curve satisfies the ordering
relationship that is a characteristic of sigmoidal curves, namely that

$$\min_j(F_j) \prec \max_j(F_{j+1} - 2F_j + F_{j-1}) \prec \max_j(F_{j+1} - F_{j-1}) \prec \max_j(F_j).$$

The symbol $\prec$ is used to denote the ordering of the features with respect to the cycle
variable $j$. However, unlike the Confidence Band Analysis discussed above, the
minimum second derivative is omitted, as some positive samples do not satisfy the
ordering with the minimum second derivative included. If the relationship is satisfied,
then $T_5 = 4/3$, and if the relationship is not satisfied, then $T_5 = 3/4$. Therefore, this test is
useful in distinguishing positive from negative samples. However, it can be fooled by
some negative samples. Thus, as with each of the tests, it is preferable to use this test in
combination with other tests.
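The ordering relationship can be checked by locating the cycle of each feature and comparing the positions. The Python sketch below is illustrative only: the function name and the use of NumPy are assumptions, the ordering is treated as strict, and ties are resolved by taking the first occurrence of each extreme.

```python
import numpy as np

def function_ordering(f):
    """Return 4/3 if the sigmoid ordering of features holds, else 3/4."""
    f = np.asarray(f, dtype=float)
    cycle_min = int(np.argmin(f))
    # Centered differences are defined for interior cycles, hence the +1 offset.
    cycle_d2_max = int(np.argmax(f[2:] - 2.0 * f[1:-1] + f[:-2])) + 1
    cycle_d1_max = int(np.argmax(f[2:] - f[:-2])) + 1
    cycle_max = int(np.argmax(f))
    ordered = cycle_min < cycle_d2_max < cycle_d1_max < cycle_max
    return 4.0 / 3.0 if ordered else 3.0 / 4.0
```

A clean sigmoid satisfies the ordering and scores 4/3, whereas the same curve reversed in time (a falling signal) scores 3/4.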

Test 6: Maximum to Baseline Comparison Test 

This test measures the change in the fluorescence curve relative to the
baseline of the curve. The test fits and then subtracts a linear baseline from the curve. It
then identifies the background cycles of the curve and calculates the maximum
fluorescence in that region. From this calculation, the test is

$$T_6 = \frac{\max_j(F_j)}{\max_{j \in \text{background}}(|F_j|)}$$

where the fluorescence values used have the background for the curve subtracted out.
The value of $T_6$ is large for positive samples and near one for samples that are noise
dominated. Therefore, this test identifies positive signals, but because the baseline is
difficult to determine accurately, it can miss some positive samples.
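A minimal Python sketch of the Maximum to Baseline comparison follows. The text does not say how the background cycles are identified, so the sketch simply assumes they are the first `n_background` cycles. That parameter, the function name and the use of NumPy are all hypothetical.

```python
import numpy as np

def max_to_baseline(f, n_background=10):
    """Ratio of the curve maximum to the largest background excursion."""
    f = np.asarray(f, dtype=float)
    j = np.arange(len(f), dtype=float)
    # Fit the linear baseline on the assumed background cycles only.
    slope, intercept = np.polyfit(j[:n_background], f[:n_background], 1)
    corrected = f - (slope * j + intercept)
    return float(corrected.max() / np.abs(corrected[:n_background]).max())
```

Because the baseline is extrapolated from a few early cycles, small fitting errors grow toward the end of the run, which is one practical reason a noise-only curve may score somewhat above one in this sketch.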


Test 7: Late Rise Test 

This test measures the change in the fluorescence curve over the last three
to five cycles. The test fits a line to the last three through five cycles of the curve using
linear regression.

If the linear fit is defined as $L(j) = A^{(m)} j + B$, where $j$ is the cycle number
and $m$ is the number of points used to determine $L(j)$, then the test is defined as

$$T_7 = 1 + \max_m\left(0, A^{(m)}\right)$$

The value of $T_7$ is larger than one for samples that have a positive slope
over the last few cycles, and is equal to one otherwise. Therefore, this test is useful in
identifying late-rising positive signals. It is also conceivable for the algorithm to
automatically add extra amplification cycles if the sample is ascertained to have a
late-rising positive signal, and further optionally, to obtain the melting temperature to verify
the identity of the product by either continuous monitoring during amplification, or
adding a melting analysis step after amplification.
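The Late Rise Test translates almost directly into code. The Python sketch below fits a line to the last three, four and five cycles and keeps the largest slope. The function name and the use of NumPy's `polyfit` for the regression are assumptions made for illustration.

```python
import numpy as np

def late_rise(f):
    """Return 1 + the largest non-negative slope over the last 3 to 5 cycles."""
    f = np.asarray(f, dtype=float)
    slopes = []
    for m in (3, 4, 5):
        cycles = np.arange(len(f) - m, len(f), dtype=float)
        slope, _intercept = np.polyfit(cycles, f[-m:], 1)
        slopes.append(slope)
    return 1.0 + max(0.0, max(slopes))
```

A curve that is flat and then climbs by 0.2 fluorescence units per cycle over its last five cycles gives a value of about 1.2, while a curve that is falling at the end gives exactly one.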

For high accuracy in automated determination of amplified material, it is
preferred to use all seven tests.

Determining the correction factor and threshold 

The correction factors $P_i$ and the Threshold used in the final formula are
found using numerical optimization. This process can be generalized as follows: first, a
desired range is set for "positive," "negative," and "indeterminate" calls using Score or a
mathematical manipulation such as CallValue (log(Score)). In the case of CallValue, an
illustrated example uses (-1, 1) for the indeterminate range, >1 for positives, and <-1 for
negatives, but it should be understood that the ranges could be set in a variety of different
ways. Once the ranges are set, the parameters $P_i$ and Threshold are optimized to
produce as many correct calls as possible and to minimize incorrect calls. The
optimization preferably is performed using a large set (for example, about 4000) of
amplification plots, about a third of which are PCR reactions chosen for being
particularly difficult to classify based on the Confidence Band Analysis alone, another
sixth being reactions that are easy to classify, another third from plots created with a
Gaussian random number generator (mean=0, variance=0.05, which is based on typical
fluorescence noise levels), and the remainder generated by saturating curves constructed
from the function

$$F = \frac{C e^{mt}}{1 + C e^{mt}}.$$

The parameters m and C are generated using uniform random number generators.
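The two synthetic plot types can be generated along the following lines. This Python sketch is hedged: the text gives the noise mean and variance and the form of the saturating function, but the ranges for m and C, the function names and the use of NumPy's random generator are assumptions chosen only so that the example runs.

```python
import numpy as np

def noise_plot(n_cycles, rng, variance=0.05):
    """Pure-noise plot: Gaussian samples with mean 0 and the stated variance."""
    return rng.normal(0.0, np.sqrt(variance), size=n_cycles)

def saturating_plot(n_cycles, rng, m_range=(0.2, 1.0), c_range=(1e-6, 1e-3)):
    """Saturating curve F = C e^(mt) / (1 + C e^(mt)) with uniform random m and C.

    m_range and c_range are hypothetical; the source does not give them.
    """
    m = rng.uniform(*m_range)
    c = rng.uniform(*c_range)
    t = np.arange(1, n_cycles + 1, dtype=float)
    growth = c * np.exp(m * t)
    return growth / (1.0 + growth)
```

A seeded generator such as `np.random.default_rng(0)` makes the synthetic set reproducible between optimization runs.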

The objective function that is optimized is the weighted sum of three
terms: the first term being the number of predicted calls that disagreed with the known
classification of the samples, the second term being the number of correct calls in the
unable-to-call or "indeterminate" category, and the third term being the number of
incorrect calls outside of the unable-to-call category. This function is designed to
produce as many correct calls as possible, decrease the number of correct calls in the
unable-to-call region and decrease the number of wrong calls outside of the
unable-to-call region. The relative tolerance for false-negative or false-positive calls is determined
by the weighting of the three terms.

Example of the Two-test Analysis

With two tests, the Signal-to-Noise Ratio Test ($T_1$) and the Confidence
Interval Test ($T_2$) are preferably used. Optimization of the parameters $P_i$ and Threshold
is shown here, as an example, using the CallValue. The CallValue from the two tests is
given by

$$\mathrm{CallValue} = P_1 \log T_1 + P_2 \log T_2 - \log(\mathrm{Threshold})$$

The expected value for the Signal-to-Noise Ratio Test ($T_1$) is one for a positive sample
and is more than one for a negative sample. The expected value of the Confidence
Interval Test ($T_2$) is one for negative samples and more than one for positive samples. As
$\log T_1$ will be a positive number for negative samples, $P_1$ should be negative if CallValue
is to be a negative number for negative samples. Similarly, $P_2$ should be positive.
Threshold is expected to be near one for this example because one is the divide between
positive and negative samples in $T_1$ and $T_2$.

To perform the optimization, guesses for the parameters are made.
CallValue is then calculated for every sample, and it is determined whether the calls
made using CallValue are correct or incorrect. The number of incorrect calls is then
counted. This is the first term of the sum. The number of correct calls in the interval
(-1, 1) and the number of incorrect calls outside of the interval (-1, 1) are counted, and those
counts are each divided by 10 to generate the second and third terms, which, by way of
example, are given less weight. The three terms are added and the sum is assigned as the
value of the objective function. Nearby values in the parameter space of the correction
factors are then used to make the objective function smaller. The process is repeated until
the value of the objective function cannot be made smaller. Using this process, $P_1$ has a
range of -6 to -4, $P_2$ a range of 0.5 to 1.0, and the Threshold a range of 1.5 to 2.0 for the illustrated
example. Using the same process, the $P_i$ and Threshold values for analysis methods that
combine more than two tests can also be determined. Table 1 shows these values using
the illustrated examples.
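The optimization loop for the two-test case can be sketched as a simple neighborhood search in Python. This is an illustration, not the procedure actually used: the base-10 logarithm, calling a sample positive when CallValue is greater than zero, the step-halving search and all function names are assumptions. Only the three-term objective with weights 1, 1/10 and 1/10 follows the description above.

```python
import math

def call_value(t1, t2, p1, p2, threshold):
    return p1 * math.log10(t1) + p2 * math.log10(t2) - math.log10(threshold)

def objective(params, samples):
    """samples: iterable of (T1, T2, is_positive) with known classification."""
    p1, p2, threshold = params
    wrong = correct_inside = wrong_outside = 0
    for t1, t2, is_positive in samples:
        cv = call_value(t1, t2, p1, p2, threshold)
        correct = (cv > 0) == is_positive      # assumption: sign decides the call
        inside = -1.0 <= cv <= 1.0
        if not correct:
            wrong += 1
            if not inside:
                wrong_outside += 1
        elif inside:
            correct_inside += 1
    return wrong + correct_inside / 10 + wrong_outside / 10

def optimize(samples, start, step=0.5, min_step=0.01):
    """Try nearby parameter values; halve the step when nothing improves."""
    best = list(start)
    best_val = objective(best, samples)
    while step >= min_step:
        improved = False
        for i in range(3):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                if trial[2] <= 0:              # Threshold must stay positive
                    continue
                val = objective(trial, samples)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2
    return best, best_val
```

A search of this kind only moves when the objective strictly decreases, so it can stall on plateaus. Restarting from several initial guesses is one common way to reduce that risk.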



Test                        Two-test       Four-test      Seven-test
                            analysis       analysis       analysis
3. Channel Consistency      N/A            1.0 to 1.5     1.0 to 2.0
4. Efficiency               N/A            3.5 to 4.0     4.0 to 5.0
5. Function Ordering        N/A            N/A            1.0 to 1.5
6. Maximum to Baseline      N/A            N/A            2.0 to 3.0
7. Late Rise                N/A            N/A            2.0 to 3.0
Threshold for Test          1.5 to 2.0     3.0 to 4.0     4.5 to 5.5



Accuracy of Automated Calls By The Seven-Test Analysis 

The seven-test analysis, which combines all seven tests, was performed on
2005 reactions, of which 1273 were previously classified as indeterminate based on the
Confidence Interval Test alone, and 732 were considered easy to call. Based on the
known classification of the reactions, 1988 (99.2%) were correctly called by the
seven-test analysis. Out of the 17 (0.8%) that were incorrectly called, 13 (or 76% of incorrects)
fell within the interval (-1, 1). Therefore, the combined test can distinguish between
positives and negatives more robustly than the Confidence Interval Test alone. This
result is illustrated in the bimodal distribution of the scores (Fig. 12).

The programming language MATLAB®, from The MathWorks, Inc., was used
for this example. However, any suitable programming language can be used.

Here, again, the combination tests may be further combined with an
automatic melting temperature (Tm) analysis to confirm the identity of amplified
product. As described above, Tm information can be acquired through continuous
monitoring of fluorescence during amplification reactions, or by an additional melting
step performed post amplification.



Melting Temperature Analysis 

In another embodiment, the "positive" calls generated by the above
method are further confirmed by automatic feedback of the melting temperature (Tm)
value of the amplified product. This additional confirmation is possible as long as the
hybridized and non-hybridized states of the probe can be distinguished by changes in
fluorescence signal, as with dsDNA dyes and hybridization probes. The Tm of an
amplified product can be determined as follows: at a predetermined and/or dynamically
chosen amplification cycle, fluorescence is monitored continuously between extension
and denaturation (or annealing and denaturation, in the case of a two-step amplification
process). This monitoring will provide a melting profile of the amplified product.
Alternatively, a Tm can be obtained by adding a separate melting process at the end of
the amplification cycle, during which fluorescence is continuously monitored and a
melting profile is obtained. The minimum (or maximum, depending on whether the
probe design produces a melting peak or valley) of the derivative of this melting profile
will determine the Tm. The Tm value will then be compared with the known Tm of the
target analyte, and if the two values are in concordance, a verified positive call is made.
If they are discordant, then a "positive" call is not verified. This technique may be used,
for example, to identify situations where a locus other than the target locus was amplified
or where primer dimers were produced.
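The Tm determination and the concordance check can be sketched as follows in Python. The function names, the use of NumPy's `gradient` for the derivative and the one-degree tolerance are assumptions for illustration. The text does not state how close the two Tm values must be to count as concordant.

```python
import numpy as np

def melting_temperature(temperatures, fluorescence, extreme="min"):
    """Temperature at the minimum (or maximum) of dF/dT over the melting profile."""
    t = np.asarray(temperatures, dtype=float)
    f = np.asarray(fluorescence, dtype=float)
    dfdt = np.gradient(f, t)                   # numerical derivative of the profile
    index = np.argmin(dfdt) if extreme == "min" else np.argmax(dfdt)
    return float(t[index])

def verify_positive(tm_observed, tm_expected, tolerance=1.0):
    """True when the observed Tm agrees with the known Tm (tolerance is hypothetical)."""
    return abs(tm_observed - tm_expected) <= tolerance
```

For a profile in which fluorescence falls as the product melts, the steepest drop marks the Tm, so the default `extreme="min"` applies. A probe design that produces a rising signal would use `extreme="max"` instead.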

Although the invention has been described in detail with reference to 
preferred embodiments, variations and modifications exist within the scope and spirit of 
the invention as described and defined in the following claims. 



